138 research outputs found

Stereo Vision and Its Application to Robotic Manipulation

Author: Jun Takamatsu
Publication venue: 'IntechOpen'
Publication date: 19/07/2011

Estimation of a focused object using a corneal surface image for eye-based interaction

Authors: Ogasawara Tsukasa, Takamatsu Jun, Takemura Kentaro, Yamakawa Tomohisa
Publication venue: University of Bern
Publication date: 27/03/2014

Researchers are considering the use of eye tracking in head-mounted camera systems, such as Google’s Project Glass. Typical methods require detailed calibration in advance, but long periods of use disrupt the calibration record between the eye and the scene camera. In addition, the focused object might not be estimated even if the point-of-regard is estimated using a portable eye-tracker. Therefore, we propose a novel method for estimating the object that a user is focused upon, where an eye camera captures the reflection on the corneal surface. Eye and environment information can be extracted from the corneal surface image simultaneously. We use inverse ray tracing to rectify the reflected image and a scale-invariant feature transform to estimate the object where the point-of-regard is located. Unwarped images can also be generated continuously from corneal surface images. We consider that our proposed method could be applied to a guidance system, and we confirmed the feasibility of this application in experiments that estimated the object focused upon and the point-of-regard.

Journal of Eye Movement Research

BOP Serials

Interactive Task Encoding System for Learning-from-Observation

Authors: Ikeuchi Katsushi, Kanehira Atsushi, Sasabuchi Kazuhiro, Takamatsu Jun, Wake Naoki
Publication date: 25/01/2023

We introduce a practical pipeline that interactively encodes multimodal human demonstrations for robot teaching. This pipeline is designed as an input system for a framework called Learning-from-Observation (LfO), which aims to program household robots with manipulative tasks through few-shot human demonstrations without coding. While most previous LfO systems run with visual demonstration, recent research on robot teaching has shown the effectiveness of verbal instruction in making recognition robust and teaching interactive. To the best of our knowledge, however, no LfO system has yet been proposed that utilizes both verbal instruction and interaction, namely multimodal LfO. This paper proposes the interactive task encoding system (ITES) as an input pipeline for multimodal LfO. ITES assumes that the user teaches step-by-step, pausing hand movements in order to match the granularity of human instructions with the granularity of robot execution. ITES recognizes tasks based on step-by-step verbal instructions that accompany the hand movements. Additionally, the recognition is made robust through interactions with the user. We test ITES on a real robot and show that the user can successfully teach multiple operations through multimodal demonstrations. The results suggest the usefulness of ITES for multimodal LfO. The source code is available at https://github.com/microsoft/symbolic-robot-teaching-interface.

Comment: 7 pages, 10 figures. Last updated January 24st, 202

arXiv.org e-Print Archive

GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System

Authors: Ikeuchi Katsushi, Kanehira Atsushi, Sasabuchi Kazuhiro, Takamatsu Jun, Wake Naoki
Publication date: 10/05/2023

This technical paper introduces a chatting robot system that utilizes recent advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT. The system is integrated with a co-speech gesture generation system, which selects appropriate gestures based on the conceptual meaning of speech. Our motivation is to explore ways of utilizing the recent progress in LLMs for practical robotic applications, which benefits the development of both chatbots and LLMs. Specifically, it enables the development of highly responsive chatbot systems by leveraging LLMs and adds visual effects to the user interface of LLMs as an additional value. The source code for the system is available on GitHub for our in-house robot (https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation) and for the Toyota HSR (https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures).

arXiv.org e-Print Archive

ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

Authors: Ikeuchi Katsushi, Kanehira Atsushi, Sasabuchi Kazuhiro, Takamatsu Jun, Wake Naoki
Publication date: 01/01/2023

This paper demonstrates how OpenAI's ChatGPT can be used in a few-shot setting to convert natural language instructions into an executable robot action sequence. The paper proposes easy-to-customize input prompts for ChatGPT that meet common requirements in practical applications, such as easy integration with robot execution systems and applicability to various environments while minimizing the impact of ChatGPT's token limit. The prompts encourage ChatGPT to output a sequence of predefined robot actions, represent the operating environment in a formalized style, and infer the updated state of the operating environment. Experiments confirmed that the proposed prompts enable ChatGPT to act according to requirements in various environments, and users can adjust ChatGPT's output with natural language feedback for safe and robust operation. The proposed prompts and source code are open-source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts.

Comment: 17 figures. Last updated April 11th, 202

arXiv.org e-Print Archive

Directory of Open Access Journals

Bounding Box Annotation with Visible Status

Authors: Katayama Hiroki, Kiyokawa Takuya, Shirakura Naoki, Takamatsu Jun, Tomochika Keita
Publication date: 10/04/2023

Training deep-learning-based vision systems requires the manual annotation of a significant amount of data to optimize several parameters of the deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention. The proposed method associates a visual marker with an object and captures it in the same image. However, because the previous method relied on moving the object within the capturing range using a fixed-point camera, the collected image dataset was limited in terms of capturing viewpoints. To overcome this limitation, this study presents a mobile application-based free-viewpoint image-capturing method. With the proposed application, users can automatically collect multi-view image datasets annotated with bounding boxes by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features to track the progress of the collection status. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection progress status, can motivate users to collect multi-view object image datasets with less mental workload and time pressure in an enjoyable manner, leading to increased engagement.

Comment: 10 pages, 16 figures

arXiv.org e-Print Archive